(57) ABSTRACT

The present invention relates to a method in a computer
system, for configuring a memory subsystem, comprising
selecting a subset of main memory, integrating the subset of
main memory within the computer system such that the
subset is physically distinct from the main memory and
configuring the subset of main memory as noncacheable
memory.

20 Claims, 4 Drawing Sheets 



10/16/2003, EAST version: 1.04.0000 



U.S. Patent May 28, 2002 Sheet 1 of 4 US 6,397,299 B1



[FIG. 1: block diagram of computer system 1, showing processor 10 with L1 cache 12, host bus 20, system controller 50 (host interface, embedded peripheral, peripheral interface 70, request/data multiplexer 80 with SEL inputs, internal bus 54, main memory controller with decode, low latency memory controller 110, low latency memory 130), memory bus 92, main memory 100, peripheral bus 40 and peripheral device 30.]






[FIG. 3: block diagram of an alternative computer system, showing processor 310 with L1 cache, system controller 350, host interface, low latency memory controller 3110, low latency memory 3130, PCI interface 370 and PCI device 330.]




REDUCED LATENCY MEMORY 
CONFIGURATION METHOD USING NON-
CACHEABLE MEMORY PHYSICALLY
DISTINCT FROM MAIN MEMORY 

This application is related to, and incorporates by
reference, an application titled "System Controller with 
Integrated Low Latency Memory" filed on even date 
herewith, Ser. No. 09/010,250. 

1. FIELD OF THE INVENTION 

The present invention relates generally to memory
subsystems in electronic devices. More particularly, the present
invention relates to reducing latency in memory subsystems.

2. BACKGROUND OF THE INVENTION

Computer systems typically comprise at least one 
processor, a memory subsystem, at least one system con- 
troller and one or more peripherals (such as PCI devices) 
operably connected by various buses, including a host bus 
operably connected between the processor and the system 
controller. The processor may include an internal level one 
(L1) cache. The memory subsystem typically comprises
system or main memory external to both the processor and
the system controller and a level two (L2) cache internal to
the system controller. Together, the L1 cache and the
memory subsystem (L2 cache and main memory) comprise 
a memory hierarchy. 

The system controller includes logic for, in conjunction 
with the processor and peripheral devices, controlling the 
transfer of data and information between the processor and 
peripheral devices and the memory subsystem. For example, 
if a processor issues a read transaction, the processor will 
determine whether the requested data is stored in the L1
cache. If the read request is a "miss" in the L1 cache, during
a subsequent clock cycle, the system controller will
determine whether the requested data is stored in the L2 cache.
If the read request is a miss in the L2 cache, during yet
another subsequent clock cycle, the system controller will
attempt to access the requested data in the main memory. At
this point, given the relatively larger size of main memory,
the slower speed of main memory, and the distance of main
memory from the CPU, a number of clock cycles may be
required to decode the address of the read request and access
the requested data in the main memory.
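
The lookup order described above (L1 cache, then L2 cache, then main memory) can be sketched as a small model. The cycle counts and the dictionary-based caches are hypothetical values chosen only for illustration; the specification gives no figures.

```python
# Hypothetical cycle costs; the specification does not state any numbers.
L1_CYCLES = 1
L2_CYCLES = 1
MAIN_MEMORY_CYCLES = 6  # assumed cost of address decode plus access


def read(address, l1_cache, l2_cache, main_memory):
    """Return (data, cycles) for a read that tries L1, then L2, then main memory."""
    cycles = L1_CYCLES
    if address in l1_cache:
        return l1_cache[address], cycles
    cycles += L2_CYCLES  # L1 miss: the system controller checks the L2 cache
    if address in l2_cache:
        return l2_cache[address], cycles
    cycles += MAIN_MEMORY_CYCLES  # L2 miss: fall through to main memory
    return main_memory[address], cycles
```

Under these assumed costs, a read that misses both caches pays the sum of all three stages, which is the latency the background section is concerned with.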

Thus, when accessing main memory (after L1 and L2
cache misses), the computer system experiences a relative
degree of latency. This latency may be increased in
multi-processor/multi-controller systems, wherein each processor
and each system controller may have a respective L1 and L2
cache. In order to preserve coherency between the respective
L1 and L2 caches and the main memory, respective L1 and
L2 cache controllers must monitor buses within the
computer system (typically the host bus) to determine if another
processor or peripheral device has modified data in an L1
cache, L2 cache or main memory. If modifications have been
made, the caches and main memory must be updated accord- 
ingly. Monitoring the memory hierarchy in this manner may 
be referred to as snooping. A snoop operation requires at 
least one clock cycle to perform, thus adding to the relative 
degree of latency within these types of computer systems. 

To deal with the latency (i.e., to prevent transactions that 
may "interfere" with the memory access request until the 
memory access request has been completed), the computer 
system may interrupt, stall or insert a number of wait states
into various operations and transactions. This results in a
relatively slower computer system with relatively slower
processing and reduced computer system throughput.
Operating such a computer system is relatively time consuming
and costly.

Thus, there exists a need in the art for apparatus and
methods for reducing the inherent latency in accessing
the memory subsystem.

In still other computer systems, a system controller may 
have an internal or "embedded" peripheral. In these com- 
puter systems, the embedded peripheral is an integral com- 
ponent of the system controller. The embedded peripheral 
may be a "secondary" processor (i.e., a processor without 
the power, capabilities and intelligence of the main or 
external processor) and may be utilized to relieve the 
computational burden on the main processor. Because these 
embedded peripherals lack the sophistication of the main 
processor (or, for that matter, most external peripherals), in 
current computer systems, the embedded peripheral cannot 
access the memory subsystem. As such, in current computer 
systems, the embedded peripheral must be provided with a
dedicated memory exclusively utilized by the embedded
peripheral. In current computer systems, this embedded
peripheral dedicated memory is external to the system
controller or "off chip". Providing this dedicated memory
"off chip" adds latency to the embedded peripheral's memory
accesses and consumes valuable space within the computer 
system. Additionally, the exclusivity of the dedicated 
memory decreases the versatility of the computer system. 

Thus, there exists a need in the art for apparatus and
methods for reducing latency in embedded peripheral
dedicated memory accesses and for increasing the versatility of
embedded peripheral dedicated memory.

3. SUMMARY OF THE INVENTION

The present invention relates to a method in a computer
system, for configuring a memory subsystem, comprising
selecting a subset of main memory, integrating the subset of
main memory within the computer system such that the
subset is physically distinct from the main memory and
configuring the subset of main memory as noncacheable
memory.

4. BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present 
invention, and the advantages thereof, reference is now 
made to the following description taken in conjunction with 
the accompanying drawings, in which: 

FIG. 1 is a block diagram of a computer system that uses 
the present invention. 

FIG. 2 is a block diagram of the low latency memory 
controller illustrated in FIG. 1. 

FIG. 3 is a block diagram of another computer system that
uses the present invention.

FIG. 4 is a block diagram of yet another computer system
that uses the present invention.

5. DETAILED DESCRIPTION OF THE INVENTION

5.1 Description of a First Embodiment

In FIG. 1, there is shown a computer system 1 comprising
an embodiment of the present invention. Generally, FIG. 1
illustrates a computer system 1 comprising a processor 10,
a system controller 50 with an integrated low latency
memory 130, a peripheral device 30 and a main memory
100. The low latency memory 130 may be considered a 
subset of the address space primarily embodied in main
memory 100, although within the computer system 1 it is
physically distinct from the main memory 100. Unlike a
cache, the low latency memory 130 is not intended to copy
any portion of the main memory 100. Instead it represents a
unique subset of the main memory. Accordingly, in the
present invention, the low latency memory 130 is a unique
component of the memory subsystem.

5.1.1 Processor

FIG. 1 illustrates a uni-processor computer system,
although the present invention may be equally beneficial in
multi-processor computer systems. The processor 10 may be
any conventional general purpose single- or multi-chip
processor such as a Pentium® Pro processor, a Pentium®
processor, an 8051 processor, a MIPS® processor, a Power
PC® processor, or an ALPHA® processor. In addition, the
processor 10 may be any conventional special purpose
processor such as a digital signal processor or a graphics
processor. The processor 10 may have an integrated level
one (L1) cache 12. As shown in FIG. 1, the processor 10 may
be operably connected to a host bus 20. When the processor
10 accesses the memory subsystem (or other portions of the
computer system), the processor 10 may be referred to as a
requesting agent.

5.1.2 Peripheral Device

FIG. 1 illustrates a computer system with a single
peripheral device 30, although the present invention may be
equally beneficial in computer systems comprising a
plurality of peripheral devices. The peripheral device 30 may be
a PCI-based device or another type of I/O device. As shown
in FIG. 1, the peripheral device 30 may be operably
connected to a peripheral bus 40. When the peripheral device 30
accesses the memory subsystem (or other portions of the
computer system), the peripheral device 30 may also be
referred to as a requesting agent.

5.1.3 Main Memory

The main memory 100 may be one or more conventional
memory devices including, without limitation, dynamic
random access memories (DRAMs), extended data out DRAMs
(EDO DRAMs), burst extended data out DRAMs (BEDO
DRAMs), static random access memories (SRAMs), video
random access memories (VRAMs), read-only memories
(ROMs), electrically erasable programmable read-only
memories (EEPROMs), and erasable programmable
read-only memories (EPROMs). The memory device may be
provided in multi-chip modules (e.g., SIMM or SIP). The
main memory 100 may be cached or cacheable memory; i.e.,
portions of the data or information stored in the main
memory 100 may also be stored in the L1 cache 12 of the
processor 10 or in the L2 cache 52 of the system controller
50. Because the main memory 100 is cacheable, a snoop
phase or cycle must be implemented whenever a processor
10 or peripheral device 30 attempts to access a main memory
address. The main memory 100 may be operably connected
to the system controller 50 by a main memory bus 92.

5.1.4 System Controller with Low Latency Memory

The system controller 50 may also be referred to as
system or core logic 50. The system controller 50 may be an
application specific integrated circuit (ASIC). Generally, the
system controller 50 operates to control the memory
subsystem within the computer system (including the main
memory 100 and the low latency memory 130) in response
to memory access requests received by the system controller
50. The system controller 50 coordinates the transfer of data
to and from the main memory 100 and the low latency
memory 130 across the host bus 20, peripheral bus 40 and
memory bus 92. Generally, the system controller 50 may
handle and schedule multiple requests from various buses
via bus arbitration control circuitry (not shown).

The various integrated components of the system
controller 50 may be operably connected to an internal system
controller bus 54. The internal system controller bus 54 may
have its own proprietary or a standard bus protocol.

5.1.4.1 Host Interface, I/O Interface and Request/Data Multiplexer

The system controller 50 comprises a host interface 60, a
peripheral interface 70 and a request/data multiplexer 80.
The system controller may also comprise an L2 cache 52.
The host interface 60 may receive data and address
information from the processor 10 over the host bus 20. The host
interface 60 may decode an address received from the
processor 10 to determine if the requested address is within
main memory 100 or within low latency memory 130. If the
requested address is within the low latency memory 130, the
host interface 60 may assert a select signal (SEL) causing the
request/data multiplexer 80 to provide the address request
and any data associated with the address request to the low
latency memory controller 110. Otherwise, the address
request and any data associated with the address request may
be provided to the main memory controller 90.

Similarly, the peripheral interface 70 may receive data and
address information from the peripheral device 30 over the
peripheral bus 40. The peripheral interface 70 may decode
an address received from the peripheral device 30 to
determine if the requested address is within main memory 100 or
within low latency memory 130. If the requested address is
within the low latency memory 130, the peripheral interface
70 may assert a select signal (SEL) causing the request/data
multiplexer 80 to provide the address request and any data
associated with the request to the low latency memory
controller 110. Otherwise, the address request and any data
associated with the request may be provided to the main
memory controller 90.

5.1.4.2 Embedded Peripheral

The embedded peripheral 140 may be a digital signal
processor (DSP) such as a 56000 series DSP manufactured
by Motorola™. When the embedded peripheral 140 accesses
the memory subsystem (or other portions of the computer
system), the embedded peripheral may also be referred to as
a requesting agent. The embedded peripheral 140 may be
utilized to relieve the computational burden on the processor
10. The embedded peripheral 140 may assert a select signal
(SEL) to access the low latency memory 130. Because all of
the data and information required by the embedded
peripheral may be stored in low latency memory 130 (that is, the
embedded peripheral 140 will have no off chip or other
memory space available to it), the select signal line of the
embedded peripheral may be hardwired to the low latency
memory controller 110, thereby allowing the embedded
peripheral direct access to the low latency memory 130.
Latency in direct memory accesses to the low latency
memory 130 is reduced by having the memory on which the
embedded peripheral 140 is exclusively dependent on the
same "chip" or physical component as the embedded
peripheral 140. In the present invention, the embedded peripheral
140 and the low latency memory 130 may be on the same
system controller ASIC 50.

5.1.4.3 Main Memory Controller

The main memory controller 90 generates the control
signals necessary to control the main memory 100 in
response to main memory access requests provided by the
request/data multiplexer 80. The main memory controller 90
may perform address decoding operations to determine the
location in main memory 100 of the access request.
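
The address decode and select-signal routing attributed to the host interface 60, the peripheral interface 70 and the request/data multiplexer 80 can be sketched as follows. This is a minimal model under stated assumptions: the base address is hypothetical, and the 1 megabyte size is the one the specification mentions for one embodiment of the low latency memory.

```python
# Hypothetical placement of the low latency range within the address space.
LOW_LATENCY_BASE = 0x000F0000
# 1 megabyte, the size the specification gives for one embodiment.
LOW_LATENCY_SIZE = 1 << 20


def decode_select(address):
    """Return True (SEL asserted) when the address lies in the low latency range."""
    return LOW_LATENCY_BASE <= address < LOW_LATENCY_BASE + LOW_LATENCY_SIZE


def route_request(address):
    """Model the request/data multiplexer: choose the controller for a request."""
    if decode_select(address):
        return "low latency memory controller 110"
    return "main memory controller 90"
```

For example, `route_request(0x000F0000)` selects the low latency memory controller, while `route_request(0)` falls through to the main memory controller.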

5.1.4.4 Low Latency Memory Controller

The low latency memory controller 110 generates the
control signals necessary to control the low latency memory
130 in response to low latency memory access requests
provided by the request/data multiplexer 80. As shown in
FIG. 2, the low latency memory controller 110 comprises a
request bus controller 114, a memory sequencer 118 and an
optional [illegible] buffer.

5.1.4.5 Low Latency Memory 

The low latency memory 130 together with the L2 cache 
52 and the main memory 100 comprise the memory sub- 
system. The low latency memory 130 may be one or more 
conventional memory devices including, without limitation, 
dynamic random access memories (DRAMs), extended data 
out DRAMs (EDO DRAMs), burst extended data out 
DRAMs (BEDO DRAMs), static random access memories 
(SRAMs), video random access memories (VRAMs), read- 
only memories (ROMs), electrically erasable programmable 
read-only memories (EEPROMs), and erasable program- 
mable read-only memories (EPROMs). In one embodiment, 
the low latency memory may be 1 megabyte. 

The low latency memory 130 is a subset of the address
space primarily embodied in the main memory 100; 
however, the low latency memory 130 is an integral com- 
ponent of the system controller 50 and thus physically 
distinct from the main memory 100. Because the low latency 
memory 130 is an integral component of the system con- 
troller 50 (i.e., integrated on the same chip), latency is 
reduced in accessing the low latency memory 130 as com- 
pared to accessing the main memory 100 (which is external 
to the system controller 50). (Generally, transmitting a signal 
from one chip or computer system component to another 
results in latency because of the propagation delay involved 
in transmitting the signal). Latency is further reduced in 
accessing low latency memory 130 as compared to access- 
ing main memory 100 because, given the relatively smaller 
size of the low latency memory 130, address requests to the 
low latency memory 130 may require fewer clock cycles to 
decode. Latency may be even further reduced in accessing 
low latency memory 130 by configuring the low latency 
memory as noncacheable memory, thus avoiding the need to 
snoop the L1 cache, L2 cache or low latency memory 130
for data modifications when low latency memory 130 is
addressed. Additionally, configuring or manufacturing the
low latency memory as SRAM (as compared to main
memory DRAM) may also reduce latency.
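
The four latency contributions named above (chip-to-chip propagation, address decode, snoop phase and device speed) can be combined in a simple additive model. All per-region cycle counts are hypothetical; the only figure taken from the specification is that a snoop operation requires at least one clock cycle.

```python
from dataclasses import dataclass

# The specification states a snoop needs at least one clock cycle.
SNOOP_CYCLES = 1


@dataclass
class MemoryRegion:
    off_chip_cycles: int  # propagation delay for leaving the system controller
    decode_cycles: int    # address decode time
    cacheable: bool       # cacheable regions require a snoop phase
    array_cycles: int     # device access time (for example SRAM versus DRAM)


def access_latency(region):
    """Sum the latency contributions for one access to the region."""
    snoop = SNOOP_CYCLES if region.cacheable else 0
    return (region.off_chip_cycles + region.decode_cycles
            + snoop + region.array_cycles)


# Hypothetical figures for illustration only.
main_memory = MemoryRegion(off_chip_cycles=2, decode_cycles=2,
                           cacheable=True, array_cycles=4)
low_latency = MemoryRegion(off_chip_cycles=0, decode_cycles=1,
                           cacheable=False, array_cycles=1)
```

With these assumed values the model charges the low latency memory fewer cycles on every term, which mirrors the qualitative argument of the paragraph.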
5.2 Alternative Embodiments 

In FIGS. 3 and 4, there are shown alternative
embodiments of computer systems comprising the present
invention. Briefly, in FIG. 3, all of the address decoding functions
may be performed in a host interface 360. Thus, access 
requests issued by a processor 310 or a PCI device 330 are 
provided to the host interface 360 for decoding and accord- 
ingly routed to a low latency memory controller 3110 or to 
main memory controller 390. 

Again, in FIG. 4, all of the address decoding functions 
may be performed in a host interface 460. In this 
embodiment, however, all routing of address requests and 
data associated with address requests is performed by a 
central switch or router 4200. 

In both of these alternative embodiments the low latency 
memory 3130 (FIG. 3) or 4130 (FIG. 4) is integrated on the 
same chip as the other components of the system controller 
350 (FIG. 3) or 450 (FIG. 4). Thus, the low latency benefits
afforded by such a configuration (as discussed above) are 
essentially equally available in these embodiments. 
5.3 Remarks

It may be seen that one advantage of the present invention
is an increase in computer system throughput. By
configuring the memory subsystem such that a subset (i.e., the low
latency memory 130) of the main memory 100 is an integral
component of the system controller 50 and physically
distinct from the main memory 100, and such that this subset
of main memory 100 is noncacheable, latency in accessing
that subset and average latency of the entire memory
subsystem are reduced. By reducing latency in accessing the
memory subsystem, the number of computer operations or
transactions that are interrupted, stalled or have wait states
inserted is reduced. This may result in time and cost savings.
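
The effect on average latency can be illustrated as a weighted average over accesses. The function below is an illustrative sketch, not part of the disclosure; the access fraction and cycle counts passed to it are assumptions supplied by the caller.

```python
def average_latency(fraction_low_latency, low_latency_cycles, main_memory_cycles):
    """Weighted average latency when a fraction of accesses hit the low latency subset."""
    if not 0.0 <= fraction_low_latency <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return (fraction_low_latency * low_latency_cycles
            + (1.0 - fraction_low_latency) * main_memory_cycles)
```

For instance, if a quarter of accesses go to a 2-cycle low latency subset and the rest to an 8-cycle main memory, the average is 0.25 × 2 + 0.75 × 8 = 6.5 cycles, compared with 8 cycles without the subset.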

It may be seen that another advantage of the present
invention is an increase in the versatility of the memory
subsystem. In the present invention the low latency memory
130 is accessible by not only the embedded peripheral 140,
but also the processor 10 and external peripheral 30.

The present invention may provide particular advantages
in the following computer operations or tasks.

In some computer systems of the general kind shown in
FIG. 1, a peripheral device (such as a PCI-based device) may
assert a "busy bit" indicating that the peripheral is
performing a transaction. This busy bit is typically stored in a
noncacheable memory space within the main memory. The
processor 10 must periodically access or poll the address of
the busy bit to determine when the peripheral device has
completed the transaction. As discussed, each of these
repeated accesses to memory has inherent latency, thereby
reducing computer system throughput. Additionally, the
repeated polling of the main memory increases main
memory bus utilization which effectively reduces the
bandwidth of the main memory bus. By storing the busy bit in the
low latency memory, the latency in these processor polling
operations may be reduced and the main memory bus may
be more effectively utilized.
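
The busy-bit polling described above can be sketched as a loop whose total cost scales with the latency of each read. The peripheral model and the per-read cycle counts are hypothetical and serve only to show how the cost of polling depends on where the busy bit is stored.

```python
def poll_until_done(read_busy_bit, cycles_per_read, max_polls=1000):
    """Poll until the busy bit reads 0; return (polls, total_cycles)."""
    polls = 0
    while polls < max_polls:
        polls += 1
        if read_busy_bit() == 0:
            return polls, polls * cycles_per_read
    raise TimeoutError("peripheral still busy")


def make_peripheral(busy_for_reads):
    """Hypothetical peripheral whose busy bit clears after a fixed number of reads."""
    remaining = [busy_for_reads]

    def read_busy_bit():
        if remaining[0] > 0:
            remaining[0] -= 1
            return 1
        return 0

    return read_busy_bit
```

Calling `poll_until_done(make_peripheral(4), cycles_per_read=9)` models polling a busy bit held in main memory, and the same call with a smaller `cycles_per_read` models a busy bit held in the low latency memory; the number of polls is identical, but the cycles spent differ in proportion to the per-read latency.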

In other computer systems, the busy bit or a similar
indication that the peripheral device is performing a
transaction (referred to as a semaphore) may be stored in the L2
cache. This scheme eliminates the need for the processor to
access main memory when polling the status of the
peripheral. However, because the semaphore is stored in L2 cache,
a snoop phase must be implemented which adds latency to
the system. By storing the semaphore in the low latency
memory, the snoop phase may be eliminated and latency
may be reduced.

In still other computer systems, because the processor and
the peripheral may be concurrently "competing" for access
to the main memory, buffering may be provided at the
peripheral for storing data and information while the
processor has access to the main memory bus. The amount of
buffering at the peripheral must compensate for the inherent
latency involved in main memory accesses by the processor.
In other words, the more latency inherent in a main memory
access, the more buffering that will be required at the
peripheral. Thus, reducing latency in a subset of main
memory accesses may result in a corresponding reduction
in the amount of buffering required at a peripheral.
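
The relationship between latency and peripheral buffering can be illustrated with a simple sizing rule: the buffer must hold whatever the peripheral produces while it waits. The rule and the parameter names below are assumptions made for illustration; the specification states only that more latency requires more buffering.

```python
import math


def required_buffer_bytes(bytes_per_cycle, latency_cycles):
    """Bytes a peripheral accumulates while waiting latency_cycles for the bus."""
    return math.ceil(bytes_per_cycle * latency_cycles)
```

Under this rule, a peripheral producing 4 bytes per cycle needs 36 bytes of buffering for a 9-cycle wait but only 8 bytes for a 2-cycle wait.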

It will be appreciated by those of ordinary skill in the art
that numerous variations of the present invention will be
possible without departing from the inventive concept
described herein. Accordingly, it is the claims set forth
below, and not the foregoing description, which define the
exclusive rights claimed in this application.

What is claimed is:

1. In a computer system, a method for operating a memory 
subsystem that implements a general purpose address space 
for a CPU and at least one other memory access requesting
agent of the computer system, comprising:

selecting a unique subset of the address space to make low
latency as compared to the latency of a main memory;

implementing the low latency subset of the address space
in physical storage within the computer system that is
physically distinct from the main memory that implements
a majority of the address space and that has at
least one associated cache that requires a snoop phase;

configuring the low latency subset as non-cacheable
memory; and

addressing the low latency subset using the same address
bus as main memory but excluding the snoop phase.

2. The method of claim 1, further comprising implementing
the low latency subset as DRAM.

3. The method of claim 1, further comprising implementing
the low latency subset as SRAM.

4. The method of claim 1, wherein the act of implementing
the low latency subset within the computer system
comprises integrating the low latency subset on a system
controller.

5. A method for improving the average latency for access
to a memory subsystem that implements a general purpose
address space for a CPU and at least one other memory
access requesting agent, comprising:

selecting a unique subset of the address space to make low
latency as compared to the latency of a main memory;

providing the main memory for the majority of the
address space with at least one associated cache that
requires a snoop phase;

implementing the low latency subset of the address space
in physical storage within the computer system such
that a first address request issued to the low latency
subset will on average have a lower decoding latency
than a second request that is issued to main memory;

configuring the low latency subset as non-cacheable
memory requiring no snoop phase for its addressing.

6. The method of claim 5, further comprising implementing
the low latency subset as DRAM.

7. The method of claim 5, further comprising implementing
the low latency subset as SRAM.

8. A method for accessing a memory subsystem that
implements a general purpose address space for a CPU and
at least one other memory access requesting agent of a
computer system, comprising:

selecting a unique subset of the address space to make low
latency as compared to the latency of a main memory;

integrating the low latency subset of the address space
within the computer system such that the low latency
subset is non-cacheable and physically distinct from the
[main memory] implementing a majority of the address
[illegible] associated cache that [illegible] the low
latency subset has lower average decode latency than a
memory access request directed to the main memory;

initiating a memory access request,

determining whether the memory access request is
directed to the low latency subset; and

responsive to the determination, accessing the low latency
subset in memory that is physically distinct from the
memory without using a snoop phase.

9. The method of claim 8, further comprising storing in
the low latency subset data that is subject to polling by a
processor.

10. The method of claim 8, further comprising implementing
the low latency subset as DRAM.

11. The method of claim 8, further comprising implementing
the low latency subset as SRAM.

12. The method of claim 8, further comprising implementing
the low latency subset as EDO DRAM.

13. The method of claim 8, further comprising implementing
the low latency subset as BEDO DRAM.

14. The method of claim 8, further comprising implementing
the low latency subset as VRAM.

15. The method of claim 8, further comprising implementing
the low latency subset as ROM.

16. The method of claim 8, further comprising implementing
the low latency subset as EEPROM.

17. The method of claim 8, further comprising implementing
the low latency subset as EPROM.

18. The method of claim 8, further comprising integrating
the low latency subset as part of a system controller.

19. The method of claim 8, wherein the computer system
comprises an embedded peripheral, and the method further
comprises limiting storage of data and information required
by the embedded peripheral to storage in the low latency
subset.

20. The method of claim 19, further comprising integrating
the embedded peripheral and the low latency memory on
the same chip.

* * * * *